Data and Variables: This project examines the relationship between extreme poverty and fertility rates using Gapminder data: Children per Woman (Total Fertility Rate): The average number of children a woman has and Extreme Poverty Rate: The percentage of people living below the international poverty line.
Hypothesis: We hypothesize that higher poverty rates correlate with higher fertility rates due to economic necessity and limited access to contraception. Declining poverty is expected to lower fertility due to improved education, healthcare, and economic opportunities.
Data Cleaning Steps: 1. Reshaped Data: Converted from wide to long format using pivot_longer(). 2. Merged Datasets: Joined on country and year. 3. Handled Missing Values: Removed incomplete observations with drop_na().
Considerations: There is limited historical data. As well, other socioeconomic factors can also impact fertility but are not included.
The animated visualization illustrates how extreme poverty and fertility rates change over time. The animation reveals a general downward trend in both variables, indicating that as time progresses, fertility rates and poverty levels tend to decrease. This supports the hypothesis that economic development and improved access to resources contribute to lower fertility rates.
The scatterplot demonstrates a positive relationship between extreme poverty rates and fertility rates across countries. Countries with higher poverty rates tend to have higher fertility rates, whereas countries with lower poverty rates tend to have fewer children per woman. The spread of points suggests some variability, but the overall trend indicates that poverty is a significant influence on fertility.
The estimated regression equation is: “Fertility_Rate = 2.981 + 0.036 * Poverty_Rate” where: 2.981 (intercept) represents the predicted fertility rate when the extreme poverty rate is zero. 0.036 (slope) represents the expected increase in fertility rate for a increase in extreme poverty rate. The intercept suggests that even in the absence of extreme poverty, fertility rates are still about 3 children. The slope coefficient indicates that as extreme poverty increases by 1 percentage, the fertility rate is expected to increase by 0.036 children per woman.
The proportion of variability (R^2) is calculated by 0.3640/0.8558 = 0.425. This suggests that about 42.5% of the variation in fertility rates can be explained by extreme poverty. The remaining variation is most likely influenced by other socioeconomic and cultural factors not included in this model.
---title: "The Effect of Extreme Poverty on Womens Fertility Rates"author: "Ainsley Forster, Sophia Taylor"date: "`r Sys.Date()`"format: htmltheme: cosmoembed-resources: truecode-tools: truetoc: trueeditor: sourceexecute: self-contained: true echo: true code-fold: true message: false warning: false---## Introduction```{r}#| label: load-and-clean-data#| code-fold: true#| code-summary: "Show load-and-clean-data code"library(tidyverse)library(readr)library(gganimate)library(gifski)library(knitr)library(gridExtra)data_children_per_woman <-read_csv("children_per_woman_total_fertility.csv") |>pivot_longer(cols ="1800":"2100",names_to ="year",values_to ="children_per_woman")data_extreme_poverty_rate <-read_csv("gm_epov_rate.csv") |>pivot_longer(cols ="1800":"2100",names_to ="year",values_to ="extreme_poverty_rate")clean_data <- data_children_per_woman |>full_join(data_extreme_poverty_rate, by =c("country", "year")) |>drop_na()```Data and Variables:This project examines the relationship between extreme poverty and fertility rates using Gapminder data:Children per Woman (Total Fertility Rate): The average number of children a woman has and Extreme Poverty Rate: The percentage of people living below the international poverty line.Hypothesis:We hypothesize that higher poverty rates correlate with higher fertility rates due to economic necessity and limited access to contraception. Declining poverty is expected to lower fertility due to improved education, healthcare, and economic opportunities.Data Cleaning Steps:1. Reshaped Data: Converted from wide to long format using pivot_longer().2. Merged Datasets: Joined on country and year.3. Handled Missing Values: Removed incomplete observations with drop_na().Considerations:There is limited historical data. As well, other socioeconomic factors can also impact fertility but are not included.## Data Visualization```{r}#| label: poverty-and-fertility-over-time#| code-fold: true#| code-summary: "Show poverty-and-fertility-over-time code"poverty_and_fertility_over_time <- clean_data |>ggplot(mapping =aes(x = extreme_poverty_rate,y = children_per_woman) ) +geom_point(aes(group =seq_along(year))) +transition_states(year,transition_length =1,state_length =1 )+theme_light()+labs(title ="Extreme Poverty and Womens Fertility Over Time",subtitle ="Fertility Rate (children per woman)",x ="Extreme Poverty Rate (%)",y ="")+geom_text(aes(x =85, y =1.5, label =paste("Year:", year)), size =5)animate(poverty_and_fertility_over_time)```The animated visualization illustrates how extreme poverty and fertility rates change over time. The animation reveals a general downward trend in both variables, indicating that as time progresses, fertility rates and poverty levels tend to decrease. This supports the hypothesis that economic development and improved access to resources contribute to lower fertility rates.```{r}#| label: poverty-vs-fertility-rates#| code-fold: true#| code-summary: "Show poverty-vs-fertility-rates code"clean_data2 <- clean_data |>group_by(country) |>summarise(mean_fertility =mean(children_per_woman),mean_poverty =mean(extreme_poverty_rate) )poverty_vs_fertility_rates <- clean_data2 |>ggplot(mapping =aes(x = mean_poverty,y = mean_fertility) ) +geom_point() +theme_light()+labs(title ="Extreme Poverty vs. Womens Fertility",subtitle ="Fertility Rate (children per woman)",x ="Extreme Poverty Rate (%)",y ="")plot(poverty_vs_fertility_rates)```The scatterplot demonstrates a positive relationship between extreme poverty rates and fertility rates across countries. Countries with higher poverty rates tend to have higher fertility rates, whereas countries with lower poverty rates tend to have fewer children per woman. The spread of points suggests some variability, but the overall trend indicates that poverty is a significant influence on fertility.## Linear Regression```{r}#| label: SLR#| code-fold: true#| code-summary: "Show SLR code"slr_model =lm(mean_fertility ~ mean_poverty, data = clean_data2)print(paste("Fertility_Rate =",round(coef(slr_model)[1], 3),"+",round(coef(slr_model)[2], 3),"* Poverty_Rate") )```The estimated regression equation is: "Fertility_Rate = 2.981 + 0.036 * Poverty_Rate"where: 2.981 (intercept) represents the predicted fertility rate when the extreme poverty rate is zero.0.036 (slope) represents the expected increase in fertility rate for a increase in extreme poverty rate. The intercept suggests that even in the absence of extreme poverty, fertility rates are still about 3 children. The slope coefficient indicates that as extreme poverty increases by 1 percentage, the fertility rate is expected to increase by 0.036 children per woman.## Model Fit```{r}#| label: variation-table#| code-fold: true#| code-summary: "Show variation-table code"var_in_response <-var(clean_data2$mean_fertility)var_in_fitted <-var(fitted(slr_model))var_in_residuals <-var(slr_model$residuals)variation_table <-data.frame("Variance Type"=c("Response", "Fitted", "Residuals"),"Variance Value"=c(var_in_response, var_in_fitted, var_in_residuals) )print(variation_table)```The proportion of variability (R^2) is calculated by 0.3640/0.8558 = 0.425. This suggests that about 42.5% of the variation in fertility rates can be explained by extreme poverty. The remaining variation is most likely influenced by other socioeconomic and cultural factors not included in this model.## Visualising Model Simulations```{r, (1,2)}#| label: simulated-data-comparison#| code-fold: true#| code-summary: "Show simulated-data-comparison code"set.seed(33498)simulated_data <- predict(object = slr_model)+ rnorm(192, mean = 0, sd = sigma(slr_model))simulated_poverty_vs_fertility_rates <- clean_data2 |> mutate(simulated_fertility = simulated_data) |> ggplot(mapping = aes(x = mean_poverty, y = simulated_fertility) ) + geom_point() + theme_light()+ labs(title = "Extreme Poverty vs. Simulated Womens Fertility", subtitle = "Simulated Fertility Rate (children per woman)", x = "Extreme Poverty Rate (%)", y = "")grid.arrange(poverty_vs_fertility_rates, simulated_poverty_vs_fertility_rates, ncol = 2)```## Generating Multiple Predictive Checks```{r}#| label: simulated-datasets-regression-comparison#| code-fold: true#| code-summary: "Show simulated-datasets-regression-comparison code"set.seed(33498)generate_and_regress <-function(observed = clean_data2) { simulated_data <-predict(object = slr_model) +rnorm(192, mean =0, sd =sigma(slr_model)) simulated_dataset <- observed |>mutate(simulated_fertility = simulated_data) regression <-lm(simulated_dataset$mean_fertility ~ simulated_dataset$simulated_fertility) summary <-summary(regression)return(summary$r.squared)}simulated_rsquared_values <-tibble(person =1:1000,rsquared =map_dbl(.x =1:1000, .f =~generate_and_regress() ) ) |>ggplot(aes(x = rsquared))+geom_histogram()+labs(title ="Distribution of Regression on Observed vs. Simulated Datasets",subtitle ="Counts",x ="R-Squared Values",y ="")+theme_light()plot(simulated_rsquared_values)```## Conclusion